[SPARK-47577][CORE][PART1] Migrate logError with variables to structured logging framework#45834
[SPARK-47577][CORE][PART1] Migrate logError with variables to structured logging framework#45834gengliangwang wants to merge 13 commits into
Conversation
|
cc @panbingkun @pan3793 @itholic as well |
| def validateStatus(status: ShuffleOutputStatus, shuffleId: Int, partition: Int) : Unit = { | ||
| if (status == null) { | ||
| val errorMessage = s"Missing an output location for shuffle $shuffleId partition $partition" | ||
| // scalastyle:off line.size.limit |
There was a problem hiding this comment.
Can we split it into multiple lines? To avoid using // scalastyle:off line.size.limit
There was a problem hiding this comment.
If the log is not long, one line makes it easier to read. I suggest we allow both styles.
| */ | ||
| object LogKey extends Enumeration { | ||
| val APPLICATION_ID = Value | ||
| val EXECUTOR_ID = Value |
There was a problem hiding this comment.
Perhaps we need to first category by some businesses, and then sort them by alphabetically within each one.
There was a problem hiding this comment.
Also, we may need a rule, is it an abbreviation or a complete spelling? For example:
APPLICATION_ID OR APP_ID
Otherwise, I am more worried that this class will become very large soon and there will be some duplicate key meanings
There was a problem hiding this comment.
Created #45862
BTW, what do you mean by first category?
There was a problem hiding this comment.
I originally planned to categorize by category first, and then sort it in alphabetical order for the second level.
Let me give an example:
APPLICATION-ID
K8S1ID
MEMOSL_ID
MAX_SIZE
MIN_SIZE
If we only sort by alphabetically, we will get:
APPLICATION_ID
MAX_SIZE
MEMOSL_ID
MIN_SIZE
K8S_ID
It's a bit weird for me to see MEMOS_ID between MAX_SIZE and MIN-SIZE.
If we first classify by category at the first level and then by alphabetically at the second level, we will obtain
# ID
APPLICATION_ID
MEMOSL_ID
K8S_ID
# SHUFFLE Value
MAX_SIZE
MIN_SIZE
Just like:
I think as our log migration work progresses, this class will become more and more large. If only sort by alphabetically , it is not sure whether developers and the final log searcher can quickly find the LogKey they want.
There was a problem hiding this comment.
@panbingkun I see. We can have a secondary category later.
In the migration, we should use generic keys and try to control the number of keys, so that the logs are easier to be queried.
There was a problem hiding this comment.
Of course, its disadvantage is that we cannot use UT like #45857 to fully guarantee that it is written in alphabetical order. It requires some manual intervention.
There was a problem hiding this comment.
@panbingkun I see. We can have a secondary category later. In the migration, we should use generic keys and try to control the number of keys, so that the logs are easier to be queried.
Okay.
|
@panbingkun @pan3793 @HyukjinKwon Thanks for the review. |
…red logging framework ### What changes were proposed in this pull request? Migrate logError with variables of core module to structured logging framework. This is part2 which transforms the logError entries of the following API ``` def logError(msg: => String, throwable: Throwable): Unit ``` to ``` def logError(entry: LogEntry, throwable: Throwable): Unit ``` migration Part1 was in #45834 ### Why are the changes needed? To enhance Apache Spark's logging system by implementing structured logging. ### Does this PR introduce _any_ user-facing change? Yes, Spark core logs will contain additional MDC ### How was this patch tested? Compiler and scala style checks, as well as code review. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45890 from gengliangwang/coreError2. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org>
| if (numFailures(index) >= maxTaskFailures) { | ||
| logError("Task %d in stage %s failed %d times; aborting job".format( | ||
| index, taskSet.id, maxTaskFailures)) | ||
| logError(log"Task ${MDC(TASK_ID, index)} in stage ${MDC(STAGE_ID, taskSet.id)} failed " + |
There was a problem hiding this comment.
task id and task index are different things. Here it's task index. @gengliangwang
There was a problem hiding this comment.
You are right. I will find time to revisit all the usages of the TASK_ID log key.
There was a problem hiding this comment.
and TASK_ATTEMPT_ID should be the same as TASK_ID?
There was a problem hiding this comment.
@cloud-fan I created #46951
@pan3793 what do you mean by the same?
There was a problem hiding this comment.
There was a problem hiding this comment.
yes these two are the same
### What changes were proposed in this pull request? Correct misleading usage of log key TASK_ID from #45834 and #46739 ### Why are the changes needed? Provide more accurate log keys in the structure logging ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing GA tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #46951 from gengliangwang/fixTaskId. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
Migrate logError with variables of core module to structured logging framework. This is part1 which transforms the logError entries of the following API
to
Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.
Does this PR introduce any user-facing change?
Yes, Spark core logs will contain additional MDC
How was this patch tested?
Compiler and scala style checks, as well as code review.
Was this patch authored or co-authored using generative AI tooling?
No